
Statistical Machine Learning



The Naive Bayes Method





The naive Bayes method rests on a simple assumption: all attributes are mutually independent given the class. This greatly reduces the computational cost of the classifier. Yet in classification tests across a wide range of scenarios, naive Bayes often turns out to be surprisingly effective.

Naive Bayes is in fact a family of supervised learning methods built on Bayes' theorem. For a class variable \(y\) and a dependent feature vector \(x_1\) through \(x_n\), Bayes' theorem states the following relationship:
\[P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)} {P(x_1, \dots, x_n)}\]
Using the naive independence assumption that
\[P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)\]
for all \(i\), this relationship simplifies to
\[P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)} {P(x_1, \dots, x_n)}\]
Since \(P(x_1, \dots, x_n)\) is constant given the input, this yields the classification rule
\[ \begin{align}\begin{aligned}P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)\\\Downarrow\\\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)\end{aligned}\end{align} \]
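As a concrete illustration of this decision rule, here is a minimal sketch of a naive Bayes classifier for categorical features in Python. It works in log space, since \(\log P(y) + \sum_i \log P(x_i \mid y)\) has the same argmax as the product but avoids numerical underflow. The function names and the Laplace smoothing constant \(\alpha\) are our own choices for the sketch, not prescribed by the derivation above.

```python
import numpy as np

def fit_categorical_nb(X, y, alpha=1.0):
    """Estimate the prior P(y) and conditionals P(x_j = v | y) by counting.

    X: (n_samples, n_features) array of non-negative integer category codes.
    y: (n_samples,) array of class labels.
    alpha: Laplace smoothing constant (our assumption, not from the text).
    """
    classes = np.unique(y)
    n_values = X.max(axis=0) + 1                 # number of categories per feature
    priors = {c: np.mean(y == c) for c in classes}
    cond = {}                                    # cond[c][j][v] = P(x_j = v | y = c)
    for c in classes:
        Xc = X[y == c]
        cond[c] = [(np.bincount(Xc[:, j], minlength=n_values[j]) + alpha)
                   / (len(Xc) + alpha * n_values[j])
                   for j in range(X.shape[1])]
    return classes, priors, cond

def predict(x, classes, priors, cond):
    """MAP rule: argmax_y  log P(y) + sum_j log P(x_j | y)."""
    scores = [np.log(priors[c])
              + sum(np.log(cond[c][j][v]) for j, v in enumerate(x))
              for c in classes]
    return classes[int(np.argmax(scores))]
```

A tiny usage example on toy data:

```python
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([0, 0, 1, 1])
model = fit_categorical_nb(X, y)
print(predict(np.array([1, 1]), *model))   # -> 1
```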

For Gaussian class-conditional densities, the independence assumption means the likelihood factorizes into a product of per-feature probabilities:
\[P\left( \mathbf{x} \mid c_i \right) = P\left( x_1, x_2, \cdots, x_d \mid c_i \right) = \prod_{j=1}^{d} P\left( x_j \mid c_i \right)\]
so that
\[\begin{aligned} P\left( \mathbf{x} \mid c_i \right) &= \frac{1}{\left( \sqrt{2\pi} \right)^{d} \sqrt{\prod_{j=1}^{d} \sigma_{ij}^{2}}} \exp\left\{ -\sum_{j=1}^{d} \frac{\left( x_j - \mu_{ij} \right)^{2}}{2\sigma_{ij}^{2}} \right\} \\ &= \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_{ij}} \exp\left\{ -\frac{\left( x_j - \mu_{ij} \right)^{2}}{2\sigma_{ij}^{2}} \right\} \end{aligned}\]
where \(\mu_{ij}\) and \(\sigma_{ij}^{2}\) are the mean and variance of feature \(j\) under class \(c_i\).
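To connect the factorized Gaussian density above to working code, the sketch below estimates \(\mu_{ij}\) and \(\sigma_{ij}^2\) per class and per feature, then scores a point by \(\log P(c_i) + \sum_j \log P(x_j \mid c_i)\). The function names and the variance floor eps are our additions for numerical robustness, not part of the formula.

```python
import numpy as np

def fit_gaussian_nb(X, y, eps=1e-9):
    """Per class c_i: prior P(c_i), feature means mu_ij, variances sigma_ij^2."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # eps keeps every variance strictly positive (our numerical safeguard)
        stats[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + eps)
    return stats

def log_likelihood(x, mu, var):
    """log of the factorized density: sum_j log N(x_j; mu_ij, sigma_ij^2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var))

def predict_gaussian(x, stats):
    """MAP rule: argmax_i  log P(c_i) + log P(x | c_i)."""
    scores = {c: np.log(prior) + log_likelihood(x, mu, var)
              for c, (prior, mu, var) in stats.items()}
    return max(scores, key=scores.get)
```

Working with log densities mirrors the product form above while avoiding underflow; normalizing and exponentiating the per-class scores would recover the posterior probabilities \(P(c_i \mid \mathbf{x})\).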